[WIP] Tag send recv callback prototyping #431
base: main
Signed-off-by: niranda perera <[email protected]>
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
/ok to test
auto [msg_available, info] = shared_resources_->get_worker()->tagProbe(
    ::ucxx::Tag(static_cast<int>(tag)), UserTagMask
);
Ideally we would use the new implementation from rapidsai/ucxx#458 so that we don't need to go through the receive queue twice, but that PR still needs a reviewer.
EXPECT_EQ(cid, chunk_id);

auto data_buf = chunk.release_data_buffer();
data_buf->wait_for_ready();
This is not a good idea. Callbacks should be lightweight because they may execute during the UCX worker progress loop; here, UCX is blocked entirely until wait_for_ready() returns.
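One common way to keep a callback lightweight is to have it only hand the received object to a queue and let a separate thread do any blocking wait. The following is a minimal sketch of that pattern, not the approach taken in this PR: `Payload`, `DeferredQueue`, and `run_deferred_demo` are hypothetical names, no UCXX or project types are used, and a plain loop stands in for the UCX progress loop.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a received chunk; not a UCXX or project type.
struct Payload {
    int chunk_id;
    std::vector<char> data;
};

// Thread-safe queue that hands payloads from the callback to a worker thread.
class DeferredQueue {
  public:
    // Called from the receive callback: cheap, never waits on data readiness.
    void push(Payload p) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(p));
        }
        cv_.notify_one();
    }

    // Signals that no more payloads will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        cv_.notify_all();
    }

    // Blocks the consumer thread only. Returns false once closed and drained.
    bool pop(Payload& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return closed_ || !items_.empty(); });
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop();
        return true;
    }

  private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<Payload> items_;
    bool closed_ = false;
};

// Simulates a progress loop firing the callback while heavy work runs elsewhere.
// Returns the number of bytes the consumer thread processed.
std::size_t run_deferred_demo(int n_messages) {
    DeferredQueue queue;
    std::size_t bytes_processed = 0;

    std::thread consumer([&] {
        Payload p;
        while (queue.pop(p)) {
            // A blocking wait (e.g. a stream synchronization) would belong here,
            // off the progress thread.
            bytes_processed += p.data.size();
        }
    });

    // The callback itself only enqueues.
    auto recv_cb = [&queue](Payload p) { queue.push(std::move(p)); };

    for (int i = 0; i < n_messages; ++i) {
        recv_cb(Payload{i, std::vector<char>(16, 'x')});
    }
    queue.close();
    consumer.join();
    return bytes_processed;
}
```

The trade-off is an extra thread and a handoff per message; in exchange, the time spent inside the callback no longer depends on how long the data takes to become ready.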
));
} else if (comm->rank() == 1) {
    auto recv_any_cb = [&](std::unique_ptr<Buffer> buf, Rank sender_rank) {
        auto const& recv_buf = br->move_to_host_vector(std::move(buf));
I don't remember whether this actually moves the data and waits for it to be ready. If it does, it causes the same problem as mentioned above, because it blocks UCX progress.
auto data_buf = br->allocate(chunk.concat_data_size(), stream, reservation);
data_buf->wait_for_ready();
Same here: waiting for the allocation to be ready will block UCX progress.
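An alternative to blocking on readiness is to record the pending operation and poll it without waiting on later passes. The sketch below illustrates that idea only: `std::future` stands in for an asynchronous allocation or copy, and `PendingOp`, `PendingList`, and `run_polling_demo` are hypothetical names that do not correspond to any API in this repository or in UCXX.

```cpp
#include <chrono>
#include <cstddef>
#include <future>
#include <list>
#include <utility>

// Hypothetical pending operation; std::future stands in for an asynchronous
// allocation or copy whose completion we must not block on.
struct PendingOp {
    int id;
    std::future<int> result;
};

class PendingList {
  public:
    void add(int id, std::future<int> f) {
        pending_.push_back(PendingOp{id, std::move(f)});
    }

    // Non-blocking: collects only the operations that are already complete
    // and returns the sum of their results. Unfinished ones stay queued.
    int poll() {
        int sum = 0;
        for (auto it = pending_.begin(); it != pending_.end();) {
            auto status = it->result.wait_for(std::chrono::seconds(0));
            if (status == std::future_status::ready) {
                sum += it->result.get();
                it = pending_.erase(it);
            } else {
                ++it;
            }
        }
        return sum;
    }

    std::size_t size() const { return pending_.size(); }

  private:
    std::list<PendingOp> pending_;
};

// Drives three polling passes as operations complete one by one.
int run_polling_demo() {
    PendingList pending;
    std::promise<int> first;
    std::promise<int> second;
    pending.add(1, first.get_future());
    pending.add(2, second.get_future());

    int total = pending.poll();  // nothing is ready yet, returns immediately
    first.set_value(10);
    total += pending.poll();     // harvests the first operation only
    second.set_value(32);
    total += pending.poll();     // harvests the second operation
    return total;
}
```

Each call to `poll()` returns immediately, so it could run between progress iterations without stalling them; whether the buffer type in this PR exposes a comparable non-blocking readiness check is not confirmed by the discussion above.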
No description provided.